An Analysis of Publication Venues for Automatic Differentiation Research
We present the results of our analysis of publication venues for papers on
automatic differentiation (AD), covering academic journals and conference
proceedings. Our data are collected from the AD publications database
maintained by the autodiff.org community website. The database is purpose-built
for the AD field and is expanding via submissions by AD researchers. Therefore,
it provides a relatively noise-free list of publications relating to the field.
However, it does include noise in the form of variant spellings of journal and
conference names. We handle this by manually correcting and merging these
variants under the official names of the corresponding venues. We also share
the raw data obtained after these corrections.
Comment: 6 pages, 3 figures
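As a concrete illustration of the venue-name correction step described above, the short Python sketch below merges variant spellings under a canonical name via a lookup table. The table entries and function name are illustrative stand-ins, not the paper's actual correction list.

# A minimal sketch of merging variant venue spellings under official
# names; the CANONICAL table is a hypothetical stand-in.
CANONICAL = {
    "siam j. sci. comput.": "SIAM Journal on Scientific Computing",
    "siam journal of scientific computing": "SIAM Journal on Scientific Computing",
    "optim. method. softw.": "Optimization Methods and Software",
}

def canonical_venue(name):
    """Map a variant spelling to its official venue name, if known."""
    return CANONICAL.get(name.strip().lower(), name.strip())

print(canonical_venue("SIAM J. Sci. Comput."))  # -> official journal name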
Automatic Differentiation of Algorithms for Machine Learning
Automatic differentiation---the mechanical transformation of numeric computer
programs to calculate derivatives efficiently and accurately---dates to the
origin of the computer age. Reverse mode automatic differentiation both
antedates and generalizes the method of backwards propagation of errors used in
machine learning. Despite this, practitioners in a variety of fields, including
machine learning, have been little influenced by automatic differentiation, and
make scant use of available tools. Here we review the technique of automatic
differentiation, describe its two main modes, and explain how it can benefit
machine learning practitioners. To reach the widest possible audience our
treatment assumes only elementary differential calculus, and does not assume
any knowledge of linear algebra.
Comment: 7 pages, 1 figure
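To make one of the two modes concrete, here is a minimal, self-contained sketch of forward-mode AD in Python using dual numbers; it illustrates the general technique and is not code from the paper.

# A minimal sketch of forward-mode AD with dual numbers: each value
# carries its derivative, and arithmetic propagates both together.
import math

class Dual:
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot   # value and derivative

    def __add__(self, other):
        # sum rule: (u + v)' = u' + v'
        return Dual(self.val + other.val, self.dot + other.dot)

    def __mul__(self, other):
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

def sin(x):
    # chain rule: (sin u)' = cos(u) * u'
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

x = Dual(1.5, 1.0)      # seed dx/dx = 1
y = x * x + sin(x)      # y = x^2 + sin(x)
print(y.val, y.dot)     # y and dy/dx = 2x + cos(x), both evaluated at 1.5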
Automated Generation of Cross-Domain Analogies via Evolutionary Computation
Analogy plays an important role in creativity, and is extensively used in
science as well as art. In this paper we introduce a technique for the
automated generation of cross-domain analogies based on a novel evolutionary
algorithm (EA). Unlike existing work in computational analogy-making, which is
restricted to creating analogies between two given cases, our approach can, for
a given case, create an analogy along with the novel analogous case itself.
Our algorithm is based on the concept of "memes", which are units of culture,
or knowledge, undergoing variation and selection under a fitness measure, and
represents evolving pieces of knowledge as semantic networks. Using a fitness
function based on Gentner's structure mapping theory of analogies, we
demonstrate the feasibility of spontaneously generating semantic networks that
are analogous to a given base network.
Comment: Conference submission, International Conference on Computational Creativity 2012 (8 pages, 6 figures)
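As a toy illustration of a structure-mapping-style fitness, the Python sketch below scores a candidate semantic network by how much of the base network's relational skeleton it can reproduce under some mapping of concepts. This is a simplified reading of the idea, not the paper's actual measure.

# A toy fitness in the spirit of Gentner's structure mapping: count how
# many base relations survive under the best concept-to-concept mapping,
# ignoring concept labels themselves. Brute force, so only suitable for
# tiny networks; purely illustrative.
from itertools import permutations

def relational_fitness(base, candidate):
    """base, candidate: sets of (relation, head, tail) triples."""
    base_concepts = sorted({c for _, h, t in base for c in (h, t)})
    cand_concepts = {c for _, h, t in candidate for c in (h, t)}
    best = 0
    for perm in permutations(cand_concepts, len(base_concepts)):
        mapping = dict(zip(base_concepts, perm))
        score = sum((r, mapping[h], mapping[t]) in candidate
                    for r, h, t in base)
        best = max(best, score)
    return best

water = {("flows", "water", "pipe"), ("contains", "pipe", "water")}
electricity = {("flows", "current", "wire"), ("contains", "wire", "current")}
print(relational_fitness(water, electricity))   # 2: a perfect analogy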
Evolution of Ideas: A Novel Memetic Algorithm Based on Semantic Networks
This paper presents a new type of evolutionary algorithm (EA) based on the
concept of "meme", where the individuals forming the population are represented
by semantic networks and the fitness measure is defined as a function of the
represented knowledge. Our work can be classified as a novel memetic algorithm
(MA), given that (1) it is units of culture, or information, that undergo
variation, transmission, and selection, close to the original sense of memetics
as introduced by Dawkins; and (2) it differs from existing MAs, where memetics
has been utilized as a means of local refinement by individual learning after
classical global sampling by an EA.
The individual pieces of information are represented as simple semantic
networks that are directed graphs of concepts and binary relations, going
through variation by memetic versions of operators such as crossover and
mutation, which utilize knowledge from commonsense knowledge bases. In
evaluating this introductory work, we adopt an interesting fitness measure: the
structure mapping theory of analogical reasoning from psychology is used to
evolve pieces of information that are analogous to a given piece of base
information.
Considering other possible fitness measures, the proposed representation and
algorithm can serve as a computational tool for modeling memetic theories of
knowledge, such as evolutionary epistemology and cultural selection theory.
Comment: Conference submission, 2012 IEEE Congress on Evolutionary Computation (8 pages, 7 figures)
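A minimal sketch of what such memetic variation might look like on triple-based semantic networks follows; the COMMONSENSE table stands in for a real commonsense knowledge base (e.g. ConceptNet), and the operator details are illustrative guesses rather than the paper's implementation.

# A minimal sketch of variation operators on semantic networks
# represented as sets of (head, relation, tail) triples. COMMONSENSE is
# a stand-in for a real commonsense knowledge base; all names here are
# illustrative.
import random

COMMONSENSE = {
    "bird": [("capable_of", "fly"), ("has_part", "wing")],
    "fish": [("capable_of", "swim"), ("at_location", "water")],
}

def mutate(network):
    """Replace one triple with a related fact about one of its concepts."""
    net = set(network)
    head, rel, tail = random.choice(sorted(net))
    facts = COMMONSENSE.get(head) or COMMONSENSE.get(tail)
    if facts:
        new_rel, new_tail = random.choice(facts)
        net.discard((head, rel, tail))
        net.add((head, new_rel, new_tail))
    return net

def crossover(a, b):
    """Exchange random subsets of triples between two parent networks."""
    a, b = sorted(a), sorted(b)
    cut_a, cut_b = random.randint(0, len(a)), random.randint(0, len(b))
    return set(a[:cut_a]) | set(b[cut_b:])

parent1 = {("bird", "capable_of", "fly"), ("bird", "has_part", "wing")}
parent2 = {("fish", "at_location", "water"), ("fish", "capable_of", "swim")}
child = crossover(parent1, parent2)
print(mutate(child) if child else set())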
Using Synthetic Data to Train Neural Networks is Model-Based Reasoning
We draw a formal connection between using synthetic training data to optimize
neural network parameters and approximate, Bayesian, model-based reasoning. In
particular, training a neural network using synthetic data can be viewed as
learning a proposal distribution generator for approximate inference in the
synthetic-data generative model. We demonstrate this connection in a
recognition task where we develop a novel Captcha-breaking architecture and
train it using synthetic data, demonstrating both state-of-the-art performance
and a way of computing task-specific posterior uncertainty. Using a neural
network trained this way, we also demonstrate successful breaking of real-world
Captchas currently used by Facebook and Wikipedia. Reasoning from these
empirical results and drawing connections with Bayesian modeling, we discuss
the robustness of synthetic data results and suggest important considerations
for ensuring good neural network generalization when training with synthetic
data.
Comment: 8 pages, 4 figures
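The toy Python sketch below captures the shape of this idea: pairs are sampled from a simple generative model (a noisy one-hot "renderer" standing in for a Captcha generator), and a softmax classifier is fit to recover the latent from the observation, i.e. to approximate the model's posterior. All details are illustrative stand-ins for the paper's renderer and architecture.

# A toy version of the idea: sample (latent, observation) pairs from a
# generative model, then fit a network to predict the latent from the
# observation, learning an approximation to the model's posterior.
import numpy as np

rng = np.random.default_rng(0)
K, D, N = 4, 16, 5000            # classes, observation dim, sample count
templates = rng.normal(size=(K, D))

def generative_model(n):
    z = rng.integers(0, K, size=n)                     # latent "text"
    x = templates[z] + 0.5 * rng.normal(size=(n, D))   # noisy rendering
    return z, x

z, x = generative_model(N)
W = np.zeros((D, K))
for _ in range(200):                          # softmax regression by GD
    logits = x @ W
    p = np.exp(logits - logits.max(1, keepdims=True))
    p /= p.sum(1, keepdims=True)
    grad = x.T @ (p - np.eye(K)[z]) / N       # cross-entropy gradient
    W -= 0.5 * grad

z_test, x_test = generative_model(1000)
pred = (x_test @ W).argmax(1)
print("accuracy on fresh synthetic data:", (pred == z_test).mean())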
Towards 3D Retrieval of Exoplanet Atmospheres: Assessing Thermochemical Equilibrium Estimation Methods
Characterizing exoplanetary atmospheres via Bayesian retrievals requires
assuming some chemistry model, such as thermochemical equilibrium or
parameterized abundances. The higher-resolution data offered by upcoming
telescopes enable more complex chemistry models within retrieval frameworks.
Yet, many chemistry codes that model more complex processes like photochemistry
and vertical transport are computationally expensive, and directly
incorporating them into a 1D retrieval model can result in prohibitively long
execution times. Additionally, phase-curve observations with upcoming
telescopes motivate 2D and 3D retrieval models, further exacerbating the
lengthy runtime for retrieval frameworks with complex chemistry models. Here,
we compare thermochemical equilibrium approximation methods based on their
speed and accuracy with respect to a Gibbs energy-minimization code. We find
that, while all methods offer orders of magnitude reductions in computational
cost, neural network surrogate models perform more accurately than the other
approaches considered, achieving a median absolute dex error of <0.03 for the
phase space considered. While our results are based on a 1D chemistry model,
our study suggests that higher dimensional chemistry models could be
incorporated into retrieval models via this surrogate modeling approach.
Comment: 22 pages, 14 figures, submitted to PSJ 2022/11/22, revised 2023/3/7, accepted 2023/3/23. Updated to add Zenodo link to Reproducible Research Compendium
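The sketch below shows the surrogate-modeling pattern in Python: an expensive equilibrium solver is sampled offline over the phase space, a small neural network is fit to those samples, and the cheap surrogate is what a retrieval would then call. slow_equilibrium_solver is a hypothetical stand-in for a Gibbs energy-minimization code, and all ranges and architecture choices are illustrative.

# A minimal sketch of the surrogate-modeling idea: fit a small neural
# network to reproduce an expensive chemistry solver, then query the
# cheap surrogate inside a retrieval loop.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)

def slow_equilibrium_solver(T, logP, metallicity):
    """Stand-in for a Gibbs code: log10 mixing ratios of a few species."""
    return np.stack([-3 + 0.001 * T + 0.1 * logP,
                     -4 + metallicity - 0.0005 * T,
                     -7 + 0.2 * logP], axis=-1)

# Sample the (T, log P, [M/H]) phase space and label it with the slow code.
X = np.column_stack([rng.uniform(500, 3000, 4000),   # temperature [K]
                     rng.uniform(-6, 2, 4000),       # log10 pressure [bar]
                     rng.uniform(-1, 1, 4000)])      # metallicity [dex]
Y = slow_equilibrium_solver(*X.T)

surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                         random_state=0).fit(X, Y)

# Accuracy in dex, echoing the paper's error metric.
X_test = np.column_stack([rng.uniform(500, 3000, 500),
                          rng.uniform(-6, 2, 500),
                          rng.uniform(-1, 1, 500)])
err = np.abs(surrogate.predict(X_test) - slow_equilibrium_solver(*X_test.T))
print("median absolute dex error:", np.median(err))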
Automatic differentiation in machine learning: a survey
Derivatives, mostly in the form of gradients and Hessians, are ubiquitous in
machine learning. Automatic differentiation (AD), also called algorithmic
differentiation or simply "autodiff", is a family of techniques similar to but
more general than backpropagation for efficiently and accurately evaluating
derivatives of numeric functions expressed as computer programs. AD is a small
but established field with applications in areas including computational fluid
dynamics, atmospheric sciences, and engineering design optimization. Until very
recently, the fields of machine learning and AD have largely been unaware of
each other and, in some cases, have independently discovered each other's
results. Despite its relevance, general-purpose AD has been missing from the
machine learning toolbox, a situation slowly changing with its ongoing adoption
under the names "dynamic computational graphs" and "differentiable
programming". We survey the intersection of AD and machine learning, cover
applications where AD has direct relevance, and address the main implementation
techniques. By precisely defining the main differentiation techniques and their
interrelationships, we aim to bring clarity to the usage of the terms
"autodiff", "automatic differentiation", and "symbolic differentiation" as
these are encountered more and more in machine learning settings.
Comment: 43 pages, 5 figures
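As a concrete complement to the survey's definitions, here is a minimal tape-based reverse-mode sketch in Python; it illustrates the general mechanism that backpropagation specializes, and is not tied to any particular AD system.

# A minimal sketch of reverse-mode AD with an explicit tape. Each
# operation records its output and the local partials of its inputs;
# the backward pass replays the tape in reverse, applying the chain rule.
import math

tape = []  # entries: (output, [(input, local_partial), ...])

class Var:
    def __init__(self, val):
        self.val, self.grad = val, 0.0

    def __add__(self, other):
        out = Var(self.val + other.val)
        tape.append((out, [(self, 1.0), (other, 1.0)]))
        return out

    def __mul__(self, other):
        out = Var(self.val * other.val)
        tape.append((out, [(self, other.val), (other, self.val)]))
        return out

def sin(x):
    out = Var(math.sin(x.val))
    tape.append((out, [(x, math.cos(x.val))]))
    return out

def backward(output):
    output.grad = 1.0
    for out, parents in reversed(tape):        # replay the tape backwards
        for parent, partial in parents:
            parent.grad += partial * out.grad  # chain-rule accumulation

x, y = Var(1.5), Var(0.5)
z = sin(x * y) + x * x        # z = sin(xy) + x^2
backward(z)
print(z.val, x.grad, y.grad)  # dz/dx = y cos(xy) + 2x, dz/dy = x cos(xy)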
Tricks from Deep Learning
The deep learning community has devised a diverse set of methods to make practical the gradient-based optimization, over large datasets, of large and highly complex models with deeply cascaded nonlinearities. Taken as a whole, these methods constitute a breakthrough, allowing computational structures which are quite wide, very deep, and with an enormous number and variety of free parameters to be effectively optimized. The result now dominates much of practical machine learning, with applications in machine translation, computer vision, and speech recognition. Many of these methods, viewed through the lens of algorithmic differentiation (AD), can be seen as either addressing issues with the gradient itself, or finding ways of achieving increased efficiency using tricks that are AD-related, but not provided by current AD systems.
The goal of this paper is to explain not just those methods of most relevance to AD, but also the technical constraints and mindset which led to their discovery. After explaining this context, we present a "laundry list" of methods developed by the deep learning community. Two of these are discussed in further mathematical detail: a way to dramatically reduce the size of the tape when performing reverse-mode AD on a (theoretically) time-reversible process like an ODE integrator; and a new mathematical insight that allows for the implementation of a stochastic Newton's method.
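For the first of those two methods, the sketch below illustrates the underlying trick on a toy problem: with a time-reversible integrator (here, leapfrog for a harmonic oscillator), the reverse sweep can reconstruct intermediate states by stepping the dynamics backwards instead of storing them on a tape. The adjoint is written by hand for this linear example, where it happens not to need the reconstructed states; for a nonlinear force they would feed the local partials. This is an illustration of the idea, not the paper's derivation.

# Tape reduction for reverse-mode AD through a reversible integrator:
# keep only the final state on the forward pass and rebuild earlier
# states by running the (exactly invertible) dynamics backwards.

def leapfrog_step(x, v, h):
    v = v - 0.5 * h * x      # half kick (force = -x)
    x = x + h * v            # drift
    v = v - 0.5 * h * x      # half kick
    return x, v

def leapfrog_step_back(x, v, h):
    return leapfrog_step(x, v, -h)   # reversibility: negate the step size

def grad_final_x(x0, v0, h, n):
    """d(final x)/d(x0, v0) without storing the trajectory."""
    x, v = x0, v0
    for _ in range(n):                 # forward pass: keep only the end state
        x, v = leapfrog_step(x, v, h)
    gx, gv = 1.0, 0.0                  # seed: gradient of the final x
    for _ in range(n):                 # reverse sweep
        # Reconstruct the previous state instead of reading it off a tape
        # (unused for this linear force, but a nonlinear adjoint needs it).
        x, v = leapfrog_step_back(x, v, h)
        # Hand-written adjoint of the three updates in leapfrog_step,
        # applied in reverse order.
        gx2, gv2 = gx, gv
        gx2 += -0.5 * h * gv2          # adjoint of the second half kick
        gv2 += h * gx2                 # adjoint of the drift
        gx2 += -0.5 * h * gv2          # adjoint of the first half kick
        gx, gv = gx2, gv2
    return gx, gv

print(grad_final_x(1.0, 0.0, 0.1, 100))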